mcadet6565@floridapoly.edu
The National Basketball Association (NBA) is one of the premier professional basketball leagues in the world. Analyzing team performance and statistics can provide insights into game strategies, player effectiveness, and overall team success. This report aims to explore various facets of NBA teams’ performance, focusing on key metrics and providing an in-depth analysis of team statistics.
The NBA consists of 30 teams divided into two conferences: the
Eastern Conference and the Western Conference. Each team plays an
82-game regular season, with the top eight teams from each conference
advancing to the playoffs. Winning an NBA championship is the ultimate
goal for any team (we are the champions: Celtics ☘️), requiring
not just skill but also effective strategies and consistent
performance.
This analysis uses the NBA champions dataset, which holds game-level box-score statistics for championship teams from 1980 to 2018 (220 rows, 24 columns). We will use R for data manipulation, visualization, and statistical modeling. The focus is on per-game statistics such as points, rebounds, and shooting percentages, together with home and away splits and a simple linear model.
# load libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(sf)
## Warning: package 'sf' was built under R version 4.3.2
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(htmlwidgets)
library(broom)
# load data
file_path <- "https://raw.githubusercontent.com/reisanar/datasets/master/NBAchampionsdata.csv"
data <- read_csv(file_path)
## Rows: 220 Columns: 24
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Team
## dbl (23): Year, Game, Win, Home, MP, FG, FGA, FGP, TP, TPA, TPP, FT, FTA, FT...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
data
## # A tibble: 220 × 24
## Year Team Game Win Home MP FG FGA FGP TP TPA TPP
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1980 Lakers 1 1 1 240 48 89 0.539 0 0 NA
## 2 1980 Lakers 2 0 1 240 48 95 0.505 0 1 0
## 3 1980 Lakers 3 1 0 240 44 92 0.478 0 1 0
## 4 1980 Lakers 4 0 0 240 44 93 0.473 0 0 NA
## 5 1980 Lakers 5 1 1 240 41 91 0.451 0 0 NA
## 6 1980 Lakers 6 1 0 240 45 92 0.489 0 2 0
## 7 1981 Celtics 1 1 1 240 41 95 0.432 0 1 0
## 8 1981 Celtics 2 0 1 240 41 82 0.5 0 3 0
## 9 1981 Celtics 3 1 0 240 40 89 0.449 2 3 0.667
## 10 1981 Celtics 4 0 0 240 35 74 0.473 0 3 0
## # ℹ 210 more rows
## # ℹ 12 more variables: FT <dbl>, FTA <dbl>, FTP <dbl>, ORB <dbl>, DRB <dbl>,
## # TRB <dbl>, AST <dbl>, STL <dbl>, BLK <dbl>, TOV <dbl>, PF <dbl>, PTS <dbl>
Check the summary statistics and structure of the dataset, and look for missing values:
# Summary statistics
summary(data)
## Year Team Game Win
## Min. :1980 Length:220 Min. :1.0 Min. :0.0000
## 1st Qu.:1989 Class :character 1st Qu.:2.0 1st Qu.:0.0000
## Median :1999 Mode :character Median :3.0 Median :1.0000
## Mean :1999 Mean :3.4 Mean :0.7091
## 3rd Qu.:2009 3rd Qu.:5.0 3rd Qu.:1.0000
## Max. :2018 Max. :7.0 Max. :1.0000
##
## Home MP FG FGA
## Min. :0.0000 Min. :240.0 Min. :25.00 Min. : 62.00
## 1st Qu.:0.0000 1st Qu.:240.0 1st Qu.:33.00 1st Qu.: 75.00
## Median :1.0000 Median :240.0 Median :37.00 Median : 80.00
## Mean :0.5045 Mean :242.4 Mean :37.75 Mean : 80.88
## 3rd Qu.:1.0000 3rd Qu.:240.0 3rd Qu.:42.00 3rd Qu.: 87.00
## Max. :1.0000 Max. :315.0 Max. :56.00 Max. :130.00
##
## FGP TP TPA TPP
## Min. :0.2890 Min. : 0.000 Min. : 0.00 Min. :0.0000
## 1st Qu.:0.4298 1st Qu.: 2.000 1st Qu.: 6.75 1st Qu.:0.2500
## Median :0.4670 Median : 5.000 Median :15.00 Median :0.3585
## Mean :0.4665 Mean : 5.355 Mean :14.60 Mean :0.3422
## 3rd Qu.:0.5000 3rd Qu.: 8.000 3rd Qu.:20.00 3rd Qu.:0.4440
## Max. :0.6170 Max. :18.000 Max. :43.00 Max. :1.0000
## NA's :6
## FT FTA FTP ORB
## Min. : 5.00 Min. : 8.00 Min. :0.3680 Min. : 3.0
## 1st Qu.:15.00 1st Qu.:21.00 1st Qu.:0.6670 1st Qu.: 9.0
## Median :19.00 Median :26.00 Median :0.7400 Median :12.0
## Mean :19.93 Mean :27.13 Mean :0.7356 Mean :12.3
## 3rd Qu.:24.00 3rd Qu.:32.25 3rd Qu.:0.8157 3rd Qu.:15.0
## Max. :43.00 Max. :57.00 Max. :1.0000 Max. :27.0
##
## DRB TRB AST STL
## Min. :16.00 Min. :22.0 Min. :11.0 Min. : 1.000
## 1st Qu.:27.00 1st Qu.:38.0 1st Qu.:18.0 1st Qu.: 6.000
## Median :30.00 Median :42.0 Median :22.0 Median : 8.000
## Mean :30.20 Mean :42.5 Mean :22.5 Mean : 7.855
## 3rd Qu.:33.25 3rd Qu.:47.0 3rd Qu.:27.0 3rd Qu.:10.000
## Max. :44.00 Max. :59.0 Max. :44.0 Max. :18.000
##
## BLK TOV PF PTS
## Min. : 0.000 Min. : 4.00 Min. :12.00 Min. : 71.00
## 1st Qu.: 3.000 1st Qu.:11.00 1st Qu.:20.00 1st Qu.: 90.75
## Median : 5.000 Median :14.00 Median :23.00 Median :101.00
## Mean : 5.323 Mean :13.71 Mean :22.86 Mean :100.79
## 3rd Qu.: 7.000 3rd Qu.:16.00 3rd Qu.:26.00 3rd Qu.:109.00
## Max. :14.000 Max. :26.00 Max. :33.00 Max. :141.00
##
# Structure of the dataset
str(data)
## spc_tbl_ [220 × 24] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Year: num [1:220] 1980 1980 1980 1980 1980 ...
## $ Team: chr [1:220] "Lakers" "Lakers" "Lakers" "Lakers" ...
## $ Game: num [1:220] 1 2 3 4 5 6 1 2 3 4 ...
## $ Win : num [1:220] 1 0 1 0 1 1 1 0 1 0 ...
## $ Home: num [1:220] 1 1 0 0 1 0 1 1 0 0 ...
## $ MP : num [1:220] 240 240 240 240 240 240 240 240 240 240 ...
## $ FG : num [1:220] 48 48 44 44 41 45 41 41 40 35 ...
## $ FGA : num [1:220] 89 95 92 93 91 92 95 82 89 74 ...
## $ FGP : num [1:220] 0.539 0.505 0.478 0.473 0.451 0.489 0.432 0.5 0.449 0.473 ...
## $ TP : num [1:220] 0 0 0 0 0 0 0 0 2 0 ...
## $ TPA : num [1:220] 0 1 1 0 0 2 1 3 3 3 ...
## $ TPP : num [1:220] NA 0 0 NA NA 0 0 0 0.667 0 ...
## $ FT : num [1:220] 13 8 23 14 26 33 16 8 12 16 ...
## $ FTA : num [1:220] 15 12 30 19 33 35 20 13 19 24 ...
## $ FTP : num [1:220] 0.867 0.667 0.767 0.737 0.788 0.943 0.8 0.615 0.632 0.667 ...
## $ ORB : num [1:220] 12 15 22 18 19 17 25 14 16 17 ...
## $ DRB : num [1:220] 31 37 34 31 37 35 29 34 28 30 ...
## $ TRB : num [1:220] 43 52 56 49 56 52 54 48 44 47 ...
## $ AST : num [1:220] 30 32 20 23 28 27 23 17 24 22 ...
## $ STL : num [1:220] 5 12 5 12 7 14 6 6 12 5 ...
## $ BLK : num [1:220] 9 7 5 6 6 4 5 7 6 6 ...
## $ TOV : num [1:220] 17 26 20 19 21 17 19 22 11 22 ...
## $ PF : num [1:220] 24 27 25 22 27 22 21 27 25 22 ...
## $ PTS : num [1:220] 109 104 111 102 108 123 98 90 94 86 ...
## - attr(*, "spec")=
## .. cols(
## .. Year = col_double(),
## .. Team = col_character(),
## .. Game = col_double(),
## .. Win = col_double(),
## .. Home = col_double(),
## .. MP = col_double(),
## .. FG = col_double(),
## .. FGA = col_double(),
## .. FGP = col_double(),
## .. TP = col_double(),
## .. TPA = col_double(),
## .. TPP = col_double(),
## .. FT = col_double(),
## .. FTA = col_double(),
## .. FTP = col_double(),
## .. ORB = col_double(),
## .. DRB = col_double(),
## .. TRB = col_double(),
## .. AST = col_double(),
## .. STL = col_double(),
## .. BLK = col_double(),
## .. TOV = col_double(),
## .. PF = col_double(),
## .. PTS = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
# Check for missing values
colSums(is.na(data))
## Year Team Game Win Home MP FG FGA FGP TP TPA TPP FT FTA FTP ORB
## 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0
## DRB TRB AST STL BLK TOV PF PTS
## 0 0 0 0 0 0 0 0
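The six missing values all sit in TPP. In the rows printed earlier, TPP is NA exactly where TPA is 0, which is what a made/attempted ratio produces when no three-pointers were attempted. Below is a minimal sketch of how one might confirm that pattern. It uses a small hypothetical data frame (`toy`) that copies the TP, TPA, and TPP values of the first six rows shown above, not the full dataset.

```r
# Hypothetical toy data: TP, TPA and TPP copied from the first six rows printed above
toy <- data.frame(
  TP  = c(0, 0, 0, 0, 0, 0),
  TPA = c(0, 1, 1, 0, 0, 2),
  TPP = c(NA, 0, 0, NA, NA, 0)
)

# Cross-tabulate "no attempts" against "percentage missing"
na_vs_zero_attempts <- table(
  zero_attempts = toy$TPA == 0,
  tpp_missing   = is.na(toy$TPP)
)
na_vs_zero_attempts

# TRUE when every missing TPP coincides with zero attempts, and vice versa
all_na_explained <- all(is.na(toy$TPP) == (toy$TPA == 0))
all_na_explained
```

If the same check holds on the full data, the NAs are structural (0/0) rather than data-entry gaps. Leaving them as NA and passing `na.rm = TRUE` to summaries is then a reasonable default, since replacing them with 0 would treat "no attempts" as 0% shooting.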
To analyze the home and away wins for the Boston Celtics:
# make columns numeric
data <- data %>%
  mutate(across(c(Win, Home, DRB), as.numeric))
# Filter data for Celtics
celtics_data <- data %>% filter(Team == "Celtics")
# Summarize home and away wins
celtics_home_away <- celtics_data %>%
  summarise(
    HomeWins = sum(ifelse(Home == 1 & Win == 1, 1, 0), na.rm = TRUE),
    AwayWins = sum(ifelse(Home == 0 & Win == 1, 1, 0), na.rm = TRUE)
  )
celtics_home_away
## # A tibble: 1 × 2
## HomeWins AwayWins
## <dbl> <dbl>
## 1 11 5
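Raw win counts depend on how many games were played in each location, so a win rate is easier to compare. The following is a sketch only: `games` is a small made-up data frame with the same `Home` and `Win` coding as the dataset (1 = yes, 0 = no), and `win_rate_by_location()` is a hypothetical helper, not part of the original analysis.

```r
library(dplyr)

# Hypothetical helper: games played, wins, and win rate by game location
win_rate_by_location <- function(df) {
  df %>%
    mutate(Location = if_else(Home == 1, "Home", "Away")) %>%
    group_by(Location) %>%
    summarise(
      Games   = n(),
      Wins    = sum(Win, na.rm = TRUE),
      WinRate = mean(Win, na.rm = TRUE),
      .groups = "drop"
    )
}

# Made-up example data (not taken from the dataset)
games <- tibble(
  Home = c(1, 1, 1, 0, 0, 0, 0),
  Win  = c(1, 1, 0, 1, 0, 0, 0)
)

rates <- win_rate_by_location(games)
rates
```

On the report's data, the same helper could be called as `win_rate_by_location(celtics_data)`.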
# convert columns to numeric
data <- data %>%
  mutate(across(c(Win, Home, DRB, PTS), as.numeric))
# Filter data for Celtics - favorite basketball team
celtics_data <- data %>% filter(Team == "Celtics")
# Getting sum of total points by Celtics in home and away games
celtics_points <- celtics_data %>%
  group_by(Home) %>%
  summarise(TotalPoints = sum(PTS, na.rm = TRUE))
# Create interactive plot
# Label Home as Away/Home once so the fill levels match the names in scale_fill_manual()
p <- celtics_points %>%
  mutate(Location = factor(Home, levels = c(0, 1), labels = c("Away", "Home"))) %>%
  ggplot(aes(x = Location, y = TotalPoints, fill = Location)) +
  geom_col() +
  labs(
    title = "Total Points by Celtics in Home and Away Games",
    subtitle = "Total points scored by the Celtics in home and away games.",
    x = "Game Location",
    y = "Total Points"
  ) +
  scale_fill_manual(values = c("Away" = "blue", "Home" = "red")) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5),
    axis.title.x = element_text(face = "bold"),
    axis.title.y = element_text(face = "bold"),
    legend.title = element_blank()
  )
# Add interactive elements with plotly
interactive_plot <- ggplotly(p, tooltip = c("x", "y"))
# Customize hover text
interactive_plot <- interactive_plot %>%
  layout(
    hoverlabel = list(
      bgcolor = "white",
      bordercolor = "black",
      font = list(size = 12)
    )
  )
# show plot
interactive_plot
# Save HTML plot
saveWidget(interactive_plot, file = "celtics_points_interactive.html")
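A total is sensitive to the number of games in each group: a team that played more home games will usually show a larger home total even if it scored the same per game. Here is a hedged sketch of the per-game alternative, with made-up numbers in a hypothetical `games` data frame rather than the report's data:

```r
library(dplyr)

# Made-up example data (not taken from the dataset)
games <- tibble(
  Home = c(1, 1, 1, 0, 0),
  PTS  = c(100, 110, 90, 105, 95)
)

# Games played, total points, and mean points per game by location
points_by_location <- games %>%
  mutate(Location = factor(Home, levels = c(0, 1), labels = c("Away", "Home"))) %>%
  group_by(Location) %>%
  summarise(
    Games       = n(),
    TotalPoints = sum(PTS, na.rm = TRUE),
    MeanPoints  = mean(PTS, na.rm = TRUE),
    .groups = "drop"
  )

points_by_location
```

In this toy example the home total is larger only because there are more home games; the per-game means are identical. Plotting `MeanPoints` instead of `TotalPoints` would avoid that distortion.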
# looking into specific team by locations and team colors
team_locations <- data.frame(
  Team = c("Lakers", "Celtics", "Sixers"),
  City = c("Los Angeles", "Boston", "Philadelphia"),
  Latitude = c(34.0522, 42.3601, 39.9526),
  Longitude = c(-118.2437, -71.0589, -75.1652),
  Color = c("yellow", "green", "red")
)
# Convert to spatial
team_locations <- st_as_sf(team_locations, coords = c("Longitude", "Latitude"), crs = 4326)
# Get US map data
us_map <- map_data("state")
# plot
spatial_plot <- ggplot() +
  geom_polygon(
    data = us_map, aes(x = long, y = lat, group = group),
    fill = "gray95", color = "black", linewidth = 0.2
  ) +
  geom_sf(
    data = team_locations, aes(geometry = geometry, fill = Team),
    size = 5, shape = 21, color = "black"
  ) +
  scale_fill_manual(values = c("Lakers" = "yellow", "Celtics" = "green", "Sixers" = "red")) +
  labs(title = "NBA Team Locations", x = "Longitude", y = "Latitude") +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    legend.title = element_blank(),
    plot.title = element_text(hjust = 0.5)
  )
spatial_plot
# Save plot
ggsave("nba_team_locations.png", plot = spatial_plot, width = 10, height = 6)
In this section, we build a simple linear model to explore the relationship between defensive rebounds (DRB) and points scored (PTS) in a game. If the two are related, that would suggest that securing defensive rebounds goes along with more scoring, although a regression on observational data cannot show that one causes the other.
# Ensure columns are numeric
data <- data %>%
  mutate(across(c(Win, Home, DRB, PTS), as.numeric))
# Linear Model and Coefficients Plot
model <- lm(PTS ~ DRB, data = data)
model_summary <- summary(model)
# Get the coefficients and their standard errors
coef_data <- broom::tidy(model) %>%
  mutate(term = recode(term, `(Intercept)` = "Intercept", `DRB` = "Defensive Rebounds"))
# Create interactive plot with hover annotations
coef_plot <- coef_data %>%
  ggplot(aes(x = term, y = estimate, fill = term, text = paste(
    "Term: ", term, "<br>",
    "Estimate: ", round(estimate, 2), "<br>",
    "Std Error: ", round(std.error, 2), "<br>",
    "This plot shows the relationship between defensive rebounds and points scored.<br>",
    "The bars represent the estimated coefficients, and the lines represent the uncertainty around these estimates."
  ))) +
  geom_col(width = 0.6) +
  geom_errorbar(aes(ymin = estimate - std.error, ymax = estimate + std.error), width = 0.2) +
  scale_fill_manual(values = c("Defensive Rebounds" = "#1f77b4", "Intercept" = "#ff7f0e")) +
  labs(
    title = "Impact of Defensive Rebounds on Points Scored",
    subtitle = "Linear Model Coefficients with Error Bars",
    x = "Model Terms",
    y = "Coefficient Estimate"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5)
  )
interactive_plot <- ggplotly(coef_plot, tooltip = "text") %>%
  layout(margin = list(t = 80)) # Adjust top margin
# Residuals Plot
residuals_data <- augment(model)
residuals_plot <- residuals_data %>%
  ggplot(aes(x = .fitted, y = .resid, text = paste(
    "Fitted Value: ", round(.fitted, 2), "<br>",
    "Residual: ", round(.resid, 2)
  ))) +
  geom_point(color = "#1f77b4") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    title = "Residuals vs Fitted Values",
    x = "Fitted Values",
    y = "Residuals"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold")
  )
interactive_residuals_plot <- ggplotly(residuals_plot, tooltip = "text") %>%
  layout(margin = list(t = 80)) # Adjust top margin
# Save interactive plots in HTML format
saveWidget(interactive_plot, file = "model_coefficients_interactive.html")
saveWidget(interactive_residuals_plot, file = "residuals_plot_interactive.html")
# Show plots
interactive_plot
interactive_residuals_plot
# Display model summary
model_summary
##
## Call:
## lm(formula = PTS ~ DRB, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.552 -10.032 -0.201 8.911 42.000
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 87.9413 5.5924 15.725 <2e-16 ***
## DRB 0.4253 0.1828 2.326 0.0209 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.18 on 218 degrees of freedom
## Multiple R-squared: 0.02423, Adjusted R-squared: 0.01975
## F-statistic: 5.412 on 1 and 218 DF, p-value: 0.02091
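The numbers in the summary can be cross-checked by hand. The sketch below recomputes the t statistic, the p-value, an approximate 95% confidence interval for the DRB slope, and R-squared from the printed estimate, standard error, and degrees of freedom. It uses only the rounded values copied from the output above, so results agree with the summary only up to rounding; the variable names are illustrative.

```r
# Values copied from the model summary above (rounded as printed)
estimate  <- 0.4253   # DRB slope
std_error <- 0.1828   # standard error of the slope
df_resid  <- 218      # residual degrees of freedom

# t statistic: estimate divided by its standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

# 95% confidence interval: estimate +/- critical t value * standard error
t_crit   <- qt(0.975, df = df_resid)
ci_lower <- estimate - t_crit * std_error
ci_upper <- estimate + t_crit * std_error

# With a single predictor, F = t^2 and R^2 = F / (F + residual df)
f_stat    <- t_value^2
r_squared <- f_stat / (f_stat + df_resid)

round(c(t = t_value, p = p_value, lower = ci_lower, upper = ci_upper, r2 = r_squared), 4)
```

On the fitted object itself, `confint(model)` or `broom::tidy(model, conf.int = TRUE)` returns the interval directly.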
Motivation for the Model: The motivation behind modeling points scored (PTS) as a function of defensive rebounds (DRB) is to understand how defensive actions relate to offensive output. The fitted slope is positive and statistically significant at the 5% level (0.4253 points per additional defensive rebound, p = 0.0209), which is consistent with the idea that defensive rebounds create scoring opportunities. The relationship is weak, however: R-squared is 0.024, so DRB alone explains only about 2.4% of the variation in points, and the model describes an association rather than a causal effect.
Based on the analysis, several key strategies can be identified for NBA teams aiming for success:
Effective Defensive Rebounding: The analysis indicates a statistically significant but weak positive relationship between defensive rebounds and points scored (p = 0.0209, R-squared = 0.024). Defensive rebounding deserves attention, but on its own it accounts for only a small share of scoring.
Consistent Performance at Home and Away: In this dataset the Celtics recorded 11 home wins and 5 away wins, so most of their wins came at home. Teams should develop strategies to perform well regardless of location.
Utilizing Advanced Statistics: Statistical analyses like the ones used in this report can provide deeper insights into team performance. Teams should incorporate such analyses into their strategy development.
This comprehensive analysis of NBA team statistics provides valuable insights into the factors contributing to team success. By focusing on key metrics such as defensive rebounds and performance consistency, teams can refine their strategies and improve their chances of winning championships. The use of interactive and spatial visualizations further enhances the understanding of these metrics, making the analysis accessible and engaging for both analysts and fans.